【Reading】Python for Data Analysis, Part 1: Chapters 1-8

Book
Author

Tony Duan

Published

December 29, 2022

https://wesmckinney.com/book/python-basics.html

Python for Data Analysis by Wes McKinney

1 Preliminaries

Essential Python libraries: NumPy, pandas, matplotlib, SciPy, scikit-learn, and statsmodels

Install packages

Code
import subprocess
import sys

# Install each library with the pip that belongs to the running interpreter.
for package in ["numpy", "matplotlib", "pandas", "seaborn", "statsmodels"]:
    subprocess.run([sys.executable, "-m", "pip", "install", package], check=True)

Import Conventions

Code
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels as sm

2 Python Language Basics, IPython, and Jupyter Notebooks

Code
print("Hello world")
Hello world
Code
import numpy as np
data = [np.random.standard_normal() for i in range(7)]
data
[0.7810594371163568,
 -0.7334955362847929,
 0.5167768619691568,
 0.9043756983157261,
 -1.243885234931727,
 0.16636014847333624,
 1.1251843432345972]

An important characteristic of the Python language is the consistency of its object model. Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box,” which is referred to as a Python object. Each object has an associated type (e.g., integer, string, or function) and internal data.
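To make the object model concrete, here is a minimal sketch. The names `double`, `examples`, and `also_double` are illustrative and do not come from the book. It shows that values of very different kinds all report a type, and that a function can be handled like any other object.

```python
import math


def double(x):
    """Return twice the argument."""
    return x * 2


# Numbers, strings, lists, functions, and modules are all objects with a type.
examples = [123, "abc", [1, 2, 3], double, math]
type_names = [type(obj).__name__ for obj in examples]
print(type_names)

# Because a function is an object, it can be bound to another name and called through it.
also_double = double
result = also_double(21)
print(result)
```

Binding `double` to a second name does not copy the function; both names refer to the same object.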

Numbers and strings

Code
v1=123
v2='abc'
type(v1)
type(v2)
str

list:

Code
a = [1, 2, 3]
a
a.append(4)
a
type(a)
list

Control Flow: if, elif, and else

Code
x = -5
if x < 0:
    print("It's negative")
It's negative

for loops

Code
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

while loops

Code
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

pass

Code
if x < 0:
    print("negative!")
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print("positive!")
positive!

range

Code
list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Code
list(range(0, 20, 2))
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

3 Built-In Data Structures, Functions, and Files

Tuple

Code
tup = (4, 5, 6)
tup
(4, 5, 6)

Indexing starts at position 0.

Code
tup[0]
4

a tuple of tuples:

Code
nested_tup = ((4, 5, 6), (7, 8))
nested_tup
((4, 5, 6), (7, 8))

Note: once a tuple is created, it is not possible to change which object is stored in each position.
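A short sketch of what tuple immutability means in practice. The variable names `assignment_failed`, `mixed`, and `longer` are illustrative. Assigning to a position fails, but a mutable object held inside a tuple can still be changed in place, and concatenation produces a new tuple rather than altering the old one.

```python
tup = (4, 5, 6)

# Assigning to a position raises TypeError because tuples are immutable.
try:
    tup[0] = 99
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)

# A mutable object stored inside a tuple can still be modified in place.
mixed = ("foo", [1, 2], True)
mixed[1].append(3)
print(mixed)

# Concatenation builds a new tuple and leaves the original untouched.
longer = tup + (7,)
print(tup, longer)
```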

list

Code
a_list = [2, 3, 7, None]
a_list
[2, 3, 7, None]
Code
gen = range(10)
list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Append an element to the end

Code
a_list.append("dwarf")
a_list
[2, 3, 7, None, 'dwarf']

Insert an element at a given position

Code
a_list.insert(1, "new")
a_list
[2, 'new', 3, 7, None, 'dwarf']

Dictionary

Set

Functions

Code
def function_without_return(x):
    print(x)
    
function_without_return('hello')
hello

Errors and Exception Handling

If float(x) raises an error, the except branch runs and returns the input unchanged.

Code
def attempt_float(x):
    try:
        return float(x)
    except:
        return x
Code
attempt_float("1.2345")
1.2345
Code
attempt_float("something")
'something'
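The bare `except:` above swallows every exception, including ones that signal a real bug. A common refinement is to catch only the errors that `float` is expected to raise. The sketch below uses a hypothetical name, `attempt_float_strict`, to keep it separate from the function above.

```python
def attempt_float_strict(x):
    """Hypothetical variant of attempt_float that only suppresses expected errors."""
    try:
        return float(x)
    except (TypeError, ValueError):
        # ValueError: text that is not a number. TypeError: an unsupported type.
        return x


print(attempt_float_strict("1.2345"))
print(attempt_float_strict("something"))
print(attempt_float_strict((1, 2)))
```

Any other exception type would now propagate to the caller instead of being silently hidden.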

Files and the Operating System

4 NumPy Basics: Arrays and Vectorized Computation

Code
import numpy as np
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
data
array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])
Code
data * 10
array([[ 15.,  -1.,  30.],
       [  0., -30.,  65.]])
Code
data.shape
data.dtype
dtype('float64')
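As a rough illustration of what "vectorized" means in the chapter title, the sketch below compares `data * 10` with an explicit element-by-element loop. The names `vectorized` and `looped` are illustrative only.

```python
import numpy as np

data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

# Vectorized: one expression is applied to every element of the array.
vectorized = data * 10

# Equivalent explicit loop, written out element by element.
looped = np.empty_like(data)
for i in range(data.shape[0]):
    for j in range(data.shape[1]):
        looped[i, j] = data[i, j] * 10

# Both approaches produce the same values.
print(np.allclose(vectorized, looped))
print(data.shape, data.dtype)
```

The vectorized form is shorter and lets NumPy do the looping in compiled code.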

5 Getting Started with pandas

Code
import numpy as np
import pandas as pd
Code
data = {"state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002, 2003],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

frame
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
Code
frame.head()
    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
Code
frame.tail()
    state  year  pop
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9
5  Nevada  2003  3.2
Code
frame["state"]
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object
Code
frame.state
0      Ohio
1      Ohio
2      Ohio
3    Nevada
4    Nevada
5    Nevada
Name: state, dtype: object
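The two access styles above return the same Series, but attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. This frame happens to contain such a clash: `pop` is also a DataFrame method. A small sketch, with illustrative variable names:

```python
import pandas as pd

data = {"state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
        "year": [2000, 2001, 2002, 2001, 2002, 2003],
        "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
frame = pd.DataFrame(data)

# For "state", bracket access and attribute access give the same Series.
same_series = frame["state"].equals(frame.state)
print(same_series)

# "pop" is also the name of a DataFrame method, so frame.pop is the method, not the column.
pop_is_method = callable(frame.pop)
print(pop_is_method)

# Bracket access always returns the column.
pop_column = frame["pop"]
print(pop_column.tolist())
```

For that reason, bracket access is the safer default when column names are not known in advance.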

6 Data Loading, Storage, and File Formats

7 Data Cleaning and Preparation

8 Data Wrangling: Join, Combine, and Reshape

Reference

https://wesmckinney.com/book/